Capturing Heterogeneity in Gene Expression Studies by “Surrogate Variable Analysis”

Authors

  • Jeffrey T. Leek
  • John D. Storey
Abstract

The false discovery rate (FDR) has been discussed extensively, and it has been pointed out that the distribution of the null p-values must be “correct” or conservative for FDR estimation, or any other standard statistical significance measure, to behave properly. The distribution of the null p-values is correct when they are Uniformly distributed on the interval (0,1); it is conservative when they are pushed towards 1 relative to the Uniform(0,1). P-values are constructed to have the Uniform distribution under the null hypothesis, and if this cannot be done exactly, a conservative version is calculated [1]. Even in a simulation study, where the right answer is known, there is no off-the-shelf approach for testing whether the null p-values have a proper distribution. In this study, we use a Kolmogorov-Smirnov (KS) test on the set of null p-values to detect deviation from the Uniform. However, we want to test whether this holds over many repeated simulations, to avoid “getting lucky” on one particular simulated data set. If the set of null p-values is Uniform, then the p-value resulting from the KS test should also follow the Uniform distribution. Therefore, we can apply a second KS test to the KS test p-values from all simulations to verify that these, too, are Uniformly distributed. We employ this nested KS test to compare the relative behavior of each multiple testing procedure discussed. If the quantiles of the KS test p-values follow the diagonal line in a quantile-quantile plot against the quantiles of the Uniform distribution, then this is very strong evidence that the p-values resulting from the procedure are “correct.”
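The nested KS test described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ks_uniform_pvalue`, `null_pvalue`, `nested_ks`) are hypothetical, the null p-values come from a simple two-sided z-test that stands in for whichever multiple testing procedure is being evaluated, and the KS p-value uses the asymptotic Kolmogorov series with a common small-sample correction, so it is only approximate.

```python
import math
import random


def ks_uniform_pvalue(pvalues):
    """Approximate one-sample KS test p-value against Uniform(0,1)."""
    xs = sorted(pvalues)
    n = len(xs)
    # KS statistic: largest gap between the empirical CDF and F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Asymptotic Kolmogorov tail with a small-sample correction (approximation).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the tail probability is numerically 1 in this range
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def null_pvalue(rng, n_samples=10):
    """Two-sided z-test p-value for data simulated under the null.

    Assumption for illustration: observations are N(0, 1) with known variance,
    so the resulting p-value is Uniform(0,1).
    """
    z = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / math.sqrt(n_samples)
    return math.erfc(abs(z) / math.sqrt(2.0))


def nested_ks(n_sims=200, n_tests=500, seed=0, pvalue_fn=null_pvalue):
    """Run a KS test per simulation, then a KS test on those KS p-values."""
    rng = random.Random(seed)
    inner = []
    for _ in range(n_sims):
        pvals = [pvalue_fn(rng) for _ in range(n_tests)]
        inner.append(ks_uniform_pvalue(pvals))
    return ks_uniform_pvalue(inner), inner


if __name__ == "__main__":
    outer, inner = nested_ks()
    print("outer KS p-value:", outer)
```

A small outer p-value suggests that the procedure's null p-values deviate from the Uniform across simulations. The sorted `inner` values can also be plotted against Uniform quantiles to produce the quantile-quantile plot mentioned in the abstract.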


Similar resources

Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis

It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of h...


Summary and discussion of: “Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis”

Gene expression study is well known to focus on finding association between expression levels of particular genes and some interesting variables, for example, a disease state. In such studies, besides the primary variable of interest, some other covariates are usually measured and included in the model of association tests. However, it is not possible to measure all the variables related to gen...


Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies

MOTIVATION: In a typical gene expression profiling study, our prime objective is to identify the genes that are differentially expressed between the samples from two different tissue types. Commonly, standard analysis of variance (ANOVA)/regression is implemented to identify the relative effects of these genes over the two types of samples from their respective arrays of expression levels. But, ...


Genetic polymorphism and expression analysis of cMBL gene in Iranian native and commercial chickens

The aims of this study were to compare the promoter sequence of the mannose-binding lectin (cMBL) gene in Iranian native and commercial chicken strains; as well as to compare the cMBL gene expression in crossbred and inbred chickens. In total 79 native (Western Azerbaijan native fowls, WANF) and 49 commercial (Arian Commercial Strain, ACS) birds were reared as parents under same management prac...


Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction

MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes ...


I-43: Identification of SOX3 as an XX Male Sex Reversal Gene in Mice and Humans

Background: Mammals utilise an XX/XY system of sex determination in which the Y-linked gene SRY (Sex-determining region Y) exerts a dominant masculinising influence on sexual development. Sex chromosome homology and comparative sequence studies suggest that SRY evolved from the related SOX3 gene on the X chromosome, although there is no direct functional evidence to support this hypothesis. The ...




Publication date: 2007